Getting and analyzing remotely sensed open data#

2023.05.23

Our last tutorial will show you how to navigate through a STAC catalog, search within a cloud-hosted data set, and acquire some popular open satellite data sets.

Goals#

  1. Use Intake-STAC to browse a STAC catalog and load the data

  2. Make a simple animation showing a time lapse of Taoyuan International Airport viewed from space (self-guided)

Installation#

We will use Intake, an interface for loading data into the scientific Python ecosystem. Here we install intake-stac, an Intake-based library for loading STAC catalogs. (Intake itself will be installed as a dependency.) In addition, we will need hvplot again to visualize our data.

pystac-client is required for Goal 2.

!pip install intake-stac hvplot rioxarray xarray==2022.12.0 pystac-client

Goal 1 procedure#

STAC#

STAC logo by the STAC team, CC-BY 4.0.

SpatioTemporal Asset Catalogs (STAC) are a set of specifications for describing spatiotemporal data sets, such as satellite images and spatial vector data. Data sets described in the common language of STAC are more accessible because users do not have to relearn how to search and access data for every provider (which is typical for current commercial data sets). Version 1.0.0 of STAC was released in May 2021, and the specification has since been adopted by major satellite data providers. You can also check out this blog post by Planet for more details.

The full specification of STAC as well as numerous tutorials are available on the STAC website. Also, on the STAC Index website you can find a collection of STAC catalogs to explore.
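To get a feel for the specification, here is a minimal sketch of a STAC Item expressed as a Python dictionary. The ID, URL, and field values are made up for illustration; real items carry many more fields:

```python
# A minimal, hypothetical STAC Item: a GeoJSON Feature with a datetime
# property and a dictionary of assets (the actual downloadable files).
stac_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-scene-20170831",                # hypothetical ID
    "geometry": {"type": "Polygon", "coordinates": [[
        [-96.0, 28.0], [-93.0, 28.0], [-93.0, 31.0],
        [-96.0, 31.0], [-96.0, 28.0],
    ]]},
    "bbox": [-96.0, 28.0, -93.0, 31.0],
    "properties": {"datetime": "2017-08-31T10:00:00Z"},
    "assets": {
        "visual": {
            "href": "https://example.com/scene.tif",   # hypothetical URL
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        },
    },
    "links": [],
}

# Catalogs and Collections are similar JSON documents that link Items (and
# each other) together, which is what lets tools crawl a data set generically.
print(sorted(stac_item["assets"]))   # ['visual']
```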

Intake#

Intake logo by the Intake dev team, BSD-2-Clause license.

Intake is a Python-based tool for accessing and interacting with local and cloud-hosted data. With the STAC plugin (Intake-STAC) installed, we can use Intake to navigate through a STAC catalog, perform basic searches, and load the desired data into memory for further analysis. Intake also provides a Jupyter-based GUI for querying and describing data with proper visualization.

The Planet disaster data#

In this tutorial, we will use the Planet disaster data, a small data set that Planet Inc. prepared to show how STAC fits into a data analysis workflow. We will be looking at Hurricane Harvey, one of the most devastating natural disasters in the U.S. Here we will get the data acquired by the PlanetScope satellite constellation over Houston, TX, U.S. on August 31, 2017, a few days after the hurricane brought record-breaking rainfall. The base URL (i.e., landing page) of this STAC catalog is https://www.planet.com/data/stac/catalog.json, and you can open it in a web browser to see what it looks like.

STAC browser#

In fact, there is a better way to visually check this STAC catalog than opening it as a plain-text JSON file in the web browser. Try the STAC browser!

STAC browser

Workflow#

First, let’s import the necessary modules:

import intake
import hvplot.xarray

Now we can use Intake to open the STAC catalog:

catalog_url = "https://www.planet.com/data/stac/"    #   "https://www.planet.com/data/stac/catalog.json" will also work. 
catalog = intake.open_stac_catalog(catalog_url)
catalog
planet:
  args:
    stac_obj: https://www.planet.com/data/stac/
  description: ''
  driver: intake_stac.catalog.StacCatalog
  metadata:
    description: This catalog serves as the main root STAC Catalog for Planet Labs.
      It links to 4 small static catalogs of open data, including a small set of Planet
      Disaster Data, some CC-BY SkySat collects and the complete Spacenet 7 images
      and labels.
    id: planet
    stac_extensions:
    - https://stac-extensions.github.io/stats/v0.2.0/schema.json
    stac_version: 1.0.0
    stats:catalogs:
      count: 9
      versions:
        1.0.0: 9
    stats:collections:
      count: 7
      versions:
        1.0.0: 7
    stats:items:
      assets:
        application/geo+json: 1423
        application/json: 26
        image/jpeg: 1
        image/png: 13
        image/tiff: 2
        image/tiff; application=geotiff: 2253
        image/tiff; application=geotiff; profile=cloud-optimized: 425
        text/xml: 3
      count: 3708
      extensions:
        https://stac-extensions.github.io/eo/v1.0.0/schema.json: 396
        https://stac-extensions.github.io/label/v1.0.0/schema.json: 1423
        https://stac-extensions.github.io/projection/v1.0.0/schema.json: 385
        https://stac-extensions.github.io/raster/v1.0.0/schema.json: 3
        https://stac-extensions.github.io/raster/v1.1.0/schema.json: 371
        https://stac-extensions.github.io/view/v1.0.0/schema.json: 31
      versions:
        1.0.0: 3708
    title: Planet Root STAC Catalog
    type: Catalog

We see lots of information here, which you can also check on the STAC browser. To list the collections this catalog contains:

list(catalog)
['planet-stac-skysat',
 'planet-disaster-data',
 'sn7',
 'planet/fusion/14N/29E-188N',
 'education']

We want to access the Planet disaster data, so we go one layer deeper and get the corresponding STAC collection:

collection = catalog['planet-disaster-data']
collection
planet-disaster-data:
  args:
    stac_obj: https://www.planet.com/data/stac/disasters/collection.json
  description: '[Planet Disaster Data](https://www.planet.com/disasterdata/) makes
    imagery available directly to the public, volunteers, humanitarian organizations,
    and other coordinating bodies in support of the International Charter for Space
    and Major Disasters. Data is released for individual disaster events, providing
    a 30 day window pre- and post-disaster. Imagery is provided under Creative Commons
    licenses, free of charge, with either CC-BY-SA or CC-BY-NC. This STAC catalog
    provides fully public access of a very small subset of the imagery, as a proof
    of concept.'
  driver: intake_stac.catalog.StacCollection
  metadata:
    catalog_dir: ''

All the metadata of this collection are available via:

collection.metadata
{'type': 'Collection',
 'id': 'planet-disaster-data',
 'stac_version': '1.0.0',
 'description': '[Planet Disaster Data](https://www.planet.com/disasterdata/) makes imagery available directly to the public, volunteers, humanitarian organizations, and other coordinating bodies in support of the International Charter for Space and Major Disasters. Data is released for individual disaster events, providing a 30 day window pre- and post-disaster. Imagery is provided under Creative Commons licenses, free of charge, with either CC-BY-SA or CC-BY-NC. This STAC catalog provides fully public access of a very small subset of the imagery, as a proof of concept.',
 'stac_extensions': [],
 'title': 'Planet Disaster Data',
 'keywords': ['disaster', 'open'],
 'summaries': {'platform': ['SS02', 'SSC1', '101c'],
  'constellation': ['skysat', 'planetscope'],
  'eo:cloud_cover': {'minimum': 0, 'maximum': 24},
  'eo:gsd': {'minimum': 0.9, 'maximum': 3.7},
  'view:off_nadir': {'minimum': 0.2, 'maximum': 27.3},
  'view:sun_azimuth': {'minimum': 122, 'maximum': 231.9},
  'view:sun_elevation': {'minimum': 56.3, 'maximum': 65.1}},
 'providers': [{'name': 'Planet',
   'description': 'Contact Planet at https://www.planet.com/contact-sales/',
   'roles': ['producer', 'processor'],
   'url': 'https://www.planet.com'},
  {'name': 'Planet Disaster Team <disaster-team@planet.com>',
   'description': 'The Planet Disaster Data Team has released this data as CC-BY-SA-4.0 to help disaster response',
   'roles': ['licensor'],
   'url': 'https://www.planet.com/disasterdata/'},
  {'name': 'Google Cloud Platform',
   'description': 'Hosting is done on a GCP storage bucket owned by the Planet Disaster Data team',
   'roles': ['host'],
   'url': 'https://storage.googleapis.com/pdd-stac/'}],
 'extent': {'spatial': {'bbox': [[-96, 28, -93, 31]]},
  'temporal': {'interval': [['2017-08-28T10:00:00Z', None]]}},
 'license': 'CC-BY-SA-4.0'}
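The metadata is a plain Python dictionary, so you can pull out fields like the spatial and temporal extent programmatically. A small sketch using the extent values printed above (in the notebook you would read `collection.metadata['extent']` instead of the hard-coded dictionary):

```python
# The 'extent' field copied from the collection metadata shown above.
extent = {
    "spatial": {"bbox": [[-96, 28, -93, 31]]},
    "temporal": {"interval": [["2017-08-28T10:00:00Z", None]]},
}

# A STAC bbox is [west, south, east, north]; a None end time means ongoing.
west, south, east, north = extent["spatial"]["bbox"][0]
start, end = extent["temporal"]["interval"][0]

print(f"Covers lon {west} to {east}, lat {south} to {north}")
print(f"From {start} to {'present' if end is None else end}")
```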

We can keep using list to descend a few layers deeper, and eventually we will reach a STAC item that points to the desired satellite image:

item = collection['hurricane-harvey']['hurricane-harvey-0831']['Houston-East-20170831-103f-100d-0f4f-RGB']
item
mosaic:
  args:
    chunks: {}
    urlpath: https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/0831/Houston-East-20170831-103f-100d-0f4f-3-band.tif
  description: 3 Band RGB Mosaic
  direct_access: allow
  driver: intake_xarray.raster.RasterIOSource
  metadata:
    href: https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/0831/Houston-East-20170831-103f-100d-0f4f-3-band.tif
    plots:
      geotiff:
        cmap: viridis
        data_aspect: 1
        dynamic: true
        frame_width: 500
        kind: image
        rasterize: true
        x: x
        y: y
    roles:
    - data
    title: 3 Band RGB Mosaic
    type: image/tiff; application=geotiff; profile=cloud-optimized
thumbnail:
  args:
    chunks: {}
    urlpath: https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/0831/Houston-East-20170831-103f-100d-0f4f-3-band.png
  description: Thumbnail
  direct_access: allow
  driver: intake_xarray.image.ImageSource
  metadata:
    href: https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/0831/Houston-East-20170831-103f-100d-0f4f-3-band.png
    plots:
      thumbnail:
        bands: channel
        data_aspect: 1
        flip_yaxis: true
        kind: rgb
        x: x
        xaxis: false
        y: y
        yaxis: false
    roles:
    - thumbnail
    title: Thumbnail
    type: image/png

You can see this item has two assets, one of which is the thumbnail, also known as the browse image:

thumbnail = item['thumbnail']

Now you can try entering thumbnail or thumbnail.describe() for additional information about this image! To get the actual image, you can either download it from the URL,

thumbnail.urlpath
'https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/0831/Houston-East-20170831-103f-100d-0f4f-3-band.png'

or use the to_dask() method to lazily stream it as a Dask array.

da_thumbnail = thumbnail.to_dask()
da_thumbnail
<xarray.DataArray (y: 552, x: 549, channel: 3)>
dask.array<xarray-<this-array>, shape=(552, 549, 3), dtype=uint8, chunksize=(552, 549, 3), chunktype=numpy.ndarray>
Coordinates:
  * y        (y) int64 0 1 2 3 4 5 6 7 8 ... 543 544 545 546 547 548 549 550 551
  * x        (x) int64 0 1 2 3 4 5 6 7 8 ... 540 541 542 543 544 545 546 547 548
  * channel  (channel) int64 0 1 2
da_thumbnail.hvplot.image(x='x', y='y', data_aspect=1)

Now let’s explore the high-resolution mosaic data using the visualization steps we saw in the last tutorial!

mosaic = item['mosaic']
da_mosaic = mosaic.to_dask()
da_mosaic
<xarray.DataArray (band: 3, y: 22094, x: 21984)>
dask.array<open_rasterio-501c37a1d96e315c20253204d9aec1ce<this-array>, shape=(3, 22094, 21984), dtype=uint8, chunksize=(3, 22094, 21984), chunktype=numpy.ndarray>
Coordinates:
  * band     (band) int64 1 2 3
  * y        (y) float64 3.524e+06 3.524e+06 3.524e+06 ... 3.447e+06 3.447e+06
  * x        (x) float64 -1.066e+07 -1.066e+07 ... -1.058e+07 -1.058e+07
Attributes: (12/13)
    transform:               (3.4638558402435815, 0.0, -10657435.586420376, 0...
    crs:                     +init=epsg:3857
    res:                     (3.4638558402435815, 3.4638558402435815)
    is_tiled:                1
    nodatavals:              (nan, nan, nan)
    scales:                  (1.0, 1.0, 1.0)
    ...                      ...
    AREA_OR_POINT:           Area
    TIFFTAG_DATETIME:        2017:09:01 15:10:49
    TIFFTAG_RESOLUTIONUNIT:  2 (pixels/inch)
    TIFFTAG_SOFTWARE:        Adobe Photoshop CC 2015.5 (Macintosh)
    TIFFTAG_XRESOLUTION:     72
    TIFFTAG_YRESOLUTION:     72
# Slice a portion of the image and plot an RGB image of that slice.
da_mosaic_subset = da_mosaic.sel(x=slice(-10641000, -10638000), y=slice(3474000, 3471000))
da_mosaic_subset.hvplot.rgb(x='x', y='y', data_aspect=1)

Question for you#

Can you find which parts of Houston were affected by Hurricane Harvey?

Goal 2 procedure#

In the latter part of the tutorial, we will show how to use pystac-client to perform a simple query over a STAC catalog. Here we will use the Sentinel-2 Cloud-Optimized GeoTIFFs STAC catalog, hosted on AWS by Element 84. The image we used in the previous tutorial also comes from this STAC data set, but this time we want to retrieve many more images.

Note

Not all STAC catalogs can be queried. Querying is an optional feature that each data provider decides whether to adopt. Thankfully, the popular Landsat and Sentinel catalogs on the AWS open data repository all support STAC querying.
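You can tell whether an API supports searching from the conformance classes its landing page advertises (pystac-client exposes these on the opened Client). A minimal local sketch, with the conformance list hard-coded and the helper function `supports_item_search` being our own illustration, not a pystac-client API:

```python
# Conformance URIs as a (hypothetical) STAC API landing page might advertise.
conforms_to = [
    "https://api.stacspec.org/v1.0.0/core",
    "https://api.stacspec.org/v1.0.0/item-search",
]

def supports_item_search(conformance):
    """Return True if the API advertises the item-search conformance class."""
    return any("item-search" in uri for uri in conformance)

print(supports_item_search(conforms_to))   # True
```

A catalog that only advertises the core class can still be browsed link by link, but search() requests against it will fail.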

PySTAC client#

pystac-client provides a basic interface for working with STAC catalogs. Intake-STAC partially builds on top of it.

Workflow#

Let’s import the necessary modules first:

import xarray as xr
import pandas as pd
import pystac_client
import numpy as np

Now we pass the STAC catalog URL to PySTAC and do the data query:

# If you are curious about the content of this STAC catalog, don't forget the STAC browser!
catalog_url = "https://earth-search.aws.element84.com/v1"
catalog = pystac_client.Client.open(catalog_url)

results = catalog.search(
    collections=["sentinel-2-l2a"],              # Search within the sentinel-2-l2a collection
    bbox = [121.21, 25.06, 121.26, 25.09],       # Bounding box; [lon-left, lat-bottom, lon-right, lat-top]. Roughly focused at the Taoyuan intl airport.
    datetime="2021-01-01/2023-05-22")            # Search within this time span.

items = results.get_all_items()

print(len(items))         # This shows how many items are available.
169
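Many STAC APIs also accept a query argument such as `query={"eo:cloud_cover": {"lt": 10}}` to filter server-side; check the provider's documentation before relying on it. The same filtering can always be done client-side on each item's properties. A self-contained sketch with made-up property dictionaries standing in for the real items:

```python
# Hypothetical item properties, mimicking the .properties dictionary
# carried by each STAC item in the search results.
item_properties = [
    {"datetime": "2021-01-05T02:30:00Z", "eo:cloud_cover": 3.2},
    {"datetime": "2021-01-15T02:30:00Z", "eo:cloud_cover": 87.5},
    {"datetime": "2021-01-25T02:30:00Z", "eo:cloud_cover": 9.9},
]

# Keep only the scenes with less than 10% cloud cover.
clear = [p for p in item_properties if p["eo:cloud_cover"] < 10]
print(len(clear))   # 2
```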

Next, we pass items to Intake for the following analysis.

items_intake = intake.open_stac_item_collection(items)
# list(items_intake)                                       # Try this!
# items_intake['S2B_51RUH_20230311_0_L2A']                 # Try this!
# items_intake['S2B_51RUH_20230311_0_L2A'].metadata        # Try this!

This time, each item contains many assets. Take S2B_51RUH_20230311_0_L2A (the image we used in the previous tutorial) as an example:

item = items_intake['S2B_51RUH_20230311_0_L2A']
for key in item.keys():
    print(key)
aot
blue
coastal
granule_metadata
green
nir
nir08
nir09
red
rededge1
rededge2
rededge3
scl
swir16
swir22
thumbnail
tileinfo_metadata
visual
wvp
aot-jp2
blue-jp2
coastal-jp2
green-jp2
nir-jp2
nir08-jp2
nir09-jp2
red-jp2
rededge1-jp2
rededge2-jp2
rededge3-jp2
scl-jp2
swir16-jp2
swir22-jp2
visual-jp2
wvp-jp2

Each entry listed above is an asset you can access in this item. For example, green is the image observed at green wavelengths, and nir is the image observed in the near infrared. Here we will retrieve visual, the 3-band true-color composite, from every image in the query result.
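The individual band assets also let you compute spectral indices. For example, the normalized difference vegetation index is NDVI = (nir - red) / (nir + red). A small sketch on synthetic reflectance arrays; with the real data you would apply the same expression to the red and nir assets' to_dask() arrays:

```python
import numpy as np

# Synthetic reflectance values standing in for the red and nir bands.
red = np.array([[0.10, 0.30], [0.20, 0.40]])
nir = np.array([[0.50, 0.35], [0.60, 0.40]])

# NDVI ranges from -1 to 1; dense vegetation gives high positive values.
ndvi = (nir - red) / (nir + red)
print(ndvi)
```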

This might take a few minutes depending on the network connection and the CPU speed. If it takes too long, try changing the search criteria to reduce the number of returned items.

# Initialize an empty collection
data_list = []
timestamp_list = []

# Loop over each queried image
for scene in items_intake.values():
    timestamp = scene.metadata['s2:generation_time']                          # get the time stamp of each image
    da = scene['visual'].to_dask()                                            # lazily load the data as a Dask array
    da_subset = da.sel(x=slice(320000, 325000), y=slice(2777000, 2772000))    # Slice the image near the airport
    # A simple filter: keep the image only if more than 90% of the pixels are
    # unsaturated (value < 255), which screens out heavily clouded scenes.
    if np.sum(da_subset.values[0, :, :] < 255) / 250000 > 0.9:                # 250000 is the total number of pixels in the slice (500 x 500)
        timestamp_list.append(timestamp)
        data_list.append(da_subset)

Now we can merge all the data into a giant 4-D data cube!

datacube = xr.concat(data_list, pd.Index(timestamp_list, name="t"))
datacube
<xarray.DataArray (t: 38, band: 3, y: 500, x: 500)>
dask.array<concatenate, shape=(38, 3, 500, 500), dtype=uint8, chunksize=(1, 3, 500, 500), chunktype=numpy.ndarray>
Coordinates:
  * band     (band) int64 1 2 3
  * y        (y) float64 2.777e+06 2.777e+06 2.777e+06 ... 2.772e+06 2.772e+06
  * x        (x) float64 3.2e+05 3.2e+05 3.2e+05 ... 3.25e+05 3.25e+05 3.25e+05
  * t        (t) object '2023-03-16T06:55:01.000000Z' ... '2021-01-15T05:16:3...
Attributes:
    transform:           (10.0, 0.0, 300000.0, 0.0, -10.0, 2800020.0)
    crs:                 +init=epsg:32651
    res:                 (10.0, 10.0)
    is_tiled:            1
    nodatavals:          (0.0, 0.0, 0.0)
    scales:              (1.0, 1.0, 1.0)
    offsets:             (0.0, 0.0, 0.0)
    AREA_OR_POINT:       Area
    OVR_RESAMPLING_ALG:  AVERAGE

Finally, let’s display the data using hvPlot’s animation widget. Have fun exploring the data with the player buttons!

datacube.hvplot.rgb(
    groupby="t",  # adds a widget that switches the view along the t axis.
    data_aspect=1,
    widget_type="scrubber",
    widget_location="bottom",
)
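Beyond animating, the datacube makes temporal aggregation a one-liner: for instance, datacube.mean(dim="t") averages transient features such as clouds into a single composite. A sketch of the same reduction on a small synthetic cube, where numpy's mean over axis 0 plays the role of xarray's mean over dim="t":

```python
import numpy as np

# Synthetic (t, band, y, x) cube: 4 time steps, 3 bands, 2x2 pixels.
rng = np.random.default_rng(0)
cube = rng.integers(0, 256, size=(4, 3, 2, 2), dtype=np.uint8)

# Average over time (axis 0) -> a single (band, y, x) composite, analogous
# to datacube.mean(dim="t") on the real xarray datacube.
composite = cube.mean(axis=0)
print(composite.shape)   # (3, 2, 2)
```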

Question for you#

Can you identify any landscape changes around Taoyuan International Airport between 2021 and 2023?